fix(worker): Cap bundle analysis processor at 10 attempts and fix retry logic #688
Conversation
…ry logic

Cap total attempts at 10 (not 10 retries + 1) for BundleAnalysisProcessorTask and LockManager so we stop after 10 tries. Add a Redis-backed attempt counter in LockManager for lock contention so broker re-deliveries with unchanged headers do not retry indefinitely. BaseCodecovTask._has_exceeded_max_attempts now takes max_attempts and compares it to attempts (retries + 1, or the header value). On a generic exception in the bundle processor, return and set the upload to error instead of re-raising, to avoid unbounded retries.

Update tests: mock request for _has_exceeded_max_attempts, set mock_redis.incr.return_value where LockManager compares attempts, and adjust the cleanup test to expect a return instead of a raised ValueError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
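The "max total attempts" semantics described in this commit can be sketched as follows. This is a hypothetical illustration, not the actual BaseCodecovTask code: the function name mirrors _has_exceeded_max_attempts, but the signature and header handling here are assumptions.

```python
# Sketch (assumed shapes) of the attempt-cap check: attempts is Celery
# retries + 1, unless the broker re-delivered the message with an
# explicit "attempts" header that should take precedence.

def has_exceeded_max_attempts(request_retries: int, headers: dict, max_attempts: int) -> bool:
    attempts = headers.get("attempts", request_retries + 1)
    return attempts >= max_attempts

# 10th execution (9 prior retries) with max_attempts=10: stop here,
# so the cap means "10 attempts total", not "10 retries + 1".
has_exceeded_max_attempts(request_retries=9, headers={}, max_attempts=10)   # -> True
has_exceeded_max_attempts(request_retries=8, headers={}, max_attempts=10)   # -> False
```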
- LockManager: extract _clear_lock_attempt_counter to remove nested try in locked()
- Upload finisher: log max_attempts as UPLOAD_PROCESSOR_MAX_RETRIES (not +1)
- lock_manager: comment TTL intent instead of restating 24h
- Tests: remove hard-coded (10) from comments; use max_attempts wording

Co-authored-by: Cursor <cursoragent@cursor.com>
- BaseCodecovTask: doc and safe_retry use max_retries; drop max_attempts property
- LockRetry: max_attempts -> max_retries (same semantics: max total attempts)
- LockManager/bundle_analysis_processor/upload_finisher: log and Sentry use max_retries
- Tests: LockRetry(max_retries=...); comments say max_retries
- celery_config: one-line convention (max_retries = max total attempts)
- Fix duplicate dict keys in lock_manager and upload_finisher

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
…ssor-at-10-attempts-and-fix

Resolve lock_manager conflict: keep the attempt counter and max_retries logic; use self.base_retry_countdown (from main) for the countdown calculation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #688 +/- ##
==========================================
- Coverage 92.37% 92.36% -0.01%
==========================================
Files 1301 1301
Lines 47782 47793 +11
Branches 1613 1613
==========================================
+ Hits 44139 44145 +6
- Misses 3334 3339 +5
Partials 309 309
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
CodSpeed Performance Report

Merging this PR will not alter performance.
…test

LockManager uses redis incr for the attempt count; the mock must return an int so the attempts >= max_retries comparison does not raise TypeError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
When LockManager's Redis attempt counter hits max_retries before the task's attempt count (e.g. on re-deliveries), it raises LockRetry(max_retries_exceeded=True, countdown=0). ManualTriggerTask only checked self._has_exceeded_max_attempts(), so it fell through to self.retry(countdown=0) and caused rapid zero-delay retries.

Align with preprocess_upload and other callers: check retry.max_retries_exceeded or self._has_exceeded_max_attempts() and return a failure dict when either is true. Add a test for the Redis-counter path.

Co-authored-by: Cursor <cursoragent@cursor.com>
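The caller-side pattern this commit aligns on can be sketched like this. Everything here is an assumed shape for illustration: the LockRetry fields mirror the PR, but handle_lock_retry and the exact failure-dict keys are hypothetical.

```python
# Sketch of the "check both stop signals" pattern: a task should stop
# when either the lock manager's Redis counter tripped (carried on the
# LockRetry exception) or its own attempt cap was reached. Otherwise it
# retries with the countdown the lock manager computed.

class LockRetry(Exception):
    def __init__(self, countdown: int, max_retries_exceeded: bool = False):
        self.countdown = countdown
        self.max_retries_exceeded = max_retries_exceeded

def handle_lock_retry(retry: LockRetry, task_attempts_exceeded: bool):
    """Return a failure dict when either stop signal is set, else None (caller retries)."""
    if retry.max_retries_exceeded or task_attempts_exceeded:
        return {"successful": False, "reason": "max_retries_exceeded"}
    return None  # caller falls through to self.retry(countdown=retry.countdown)

# Redis-counter path: returns the failure dict instead of a zero-delay retry.
handle_lock_retry(LockRetry(0, max_retries_exceeded=True), False)
```

Checking only the task-side cap, as ManualTriggerTask did, misses the first signal and turns countdown=0 re-deliveries into a hot retry loop.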
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    _set_upload_error_and_commit(
        db_session, upload, commitid, repoid, log_suffix=" fallback"
    )
    return previous_result
Inconsistent return bypasses defensive isinstance check
Low Severity
The retryable-error max-retries-exceeded path at line 258 returns previous_result directly, while the general exception path at line 295 returns processing_results. Since processing_results is defined as previous_result if isinstance(previous_result, list) else [], these diverge when previous_result is unexpectedly not a list — the general exception path safely returns [], but the retryable error path returns the raw non-list value, bypassing the defensive isinstance guard. The old code consistently used processing_results in both paths.
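The divergence Bugbot describes comes down to one defensive guard. A minimal sketch (the helper name is hypothetical; the PR inlines this expression):

```python
# The general exception path returns this sanitized value; the
# retryable-error path returns the raw previous_result, so the two
# diverge exactly when an upstream task passed something non-list.

def sanitize(previous_result):
    # Defensive guard: the chain should pass a list, but fall back to []
    # if an upstream task returned something unexpected.
    return previous_result if isinstance(previous_result, list) else []

sanitize([{"upload": 1}])  # -> [{"upload": 1}]
sanitize(None)             # -> []; returning raw None would bypass the guard
```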


Cap the bundle analysis processor at 10 total attempts (not 10 retries + 1) and fix infinite retries when the broker re-delivers with unchanged headers.
What:

- BundleAnalysisProcessorTask and LockManager now treat the configured value as max total attempts (e.g. 10 = 10 attempts, then stop).
- BaseCodecovTask._has_exceeded_max_attempts(max_attempts) compares attempts (from retries + 1 or headers["attempts"]) to max_attempts.
- LockManager uses a Redis-backed attempt counter for lock contention so re-deliveries without new headers don't retry forever.
- On a generic exception in the processor we return and set the upload to error instead of re-raising, so the task is acknowledged and we avoid unbounded retries.

Why: Prevents runaway retries and aligns behavior with a clear "10 attempts then stop" cap; fixes cases where the broker re-delivers with the same headers, so Celery's retry count doesn't increase.

Tests: Updated unit and integration tests: mock request for _has_exceeded_max_attempts, set mock_redis.incr.return_value where LockManager compares attempts, and adjusted the cleanup-with-None-result test to expect return + upload error instead of a raised ValueError.

Refs CCMRG-2042
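The Redis-backed attempt counter can be sketched as below. The key format, function name, and TTL handling are assumptions for illustration; only incr/expire being the underlying Redis operations comes from the PR.

```python
# Sketch of the Redis-backed attempt counter. INCR is atomic, so each
# broker re-delivery bumps the count even when Celery's retry header
# never changes; the TTL keeps a stale counter from capping an
# unrelated future run of the same lock.

LOCK_ATTEMPT_TTL = 60 * 60 * 24  # intent: outlive any realistic retry window

def count_lock_attempt(redis_client, lock_name: str, max_retries: int) -> bool:
    """Return True once this lock has been attempted max_retries times."""
    key = f"lock_attempts:{lock_name}"  # hypothetical key format
    attempts = redis_client.incr(key)
    if attempts == 1:
        redis_client.expire(key, LOCK_ATTEMPT_TTL)  # set TTL on first attempt only
    return attempts >= max_retries  # True -> raise LockRetry(max_retries_exceeded=True)
```

On a True result the lock manager raises LockRetry with max_retries_exceeded=True; per the Overview below, the counter is cleared again once the lock is successfully used.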
Note
Medium Risk
Changes core retry/locking behavior and alters bundle analysis processor failure semantics (return vs raise), which could affect task completion and error propagation. Risk is mitigated by extensive unit/integration test updates and mostly scoped to worker retry paths.
Overview

Fixes runaway retries by redefining max_retries across worker tasks as maximum total attempts (stop when attempts >= max_retries) and updating BaseCodecovTask._has_exceeded_max_attempts / safe_retry and config comments accordingly.

LockManager now tracks lock-acquisition attempts via a Redis incr counter keyed per lock (with TTL) so tasks can stop retrying even when broker re-deliveries don't bump Celery retry counts; it also clears this counter on successful lock usage and updates LockRetry, logs, and Sentry context to report max_retries/attempts.

BundleAnalysisProcessorTask (and several other lock-using tasks like manual_trigger, preprocess_upload, upload_finisher, and the notifier/finisher tasks) now honor the lock manager's max_retries_exceeded signal in addition to task attempt caps. The bundle analysis processor additionally refactors repeated logging/DB error-state commits and, on generic processing exceptions, sets the upload to error and returns instead of re-raising, to avoid unbounded retries and keep chains progressing with previous_result.

Tests are updated to reflect Redis attempt counting (mock_redis.incr), the new attempt-boundary semantics, and the changed processor behavior (returning previous_result / not raising in some failure cases).

Written by Cursor Bugbot for commit 5a7c73f. This will update automatically on new commits.